Model Details

This model is an int4 quantization (group_size 128, symmetric) of deepseek-ai/DeepSeek-R1-0528-Qwen3-8B, generated with the intel/auto-round algorithm.

Please follow the license of the original model.
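
You can confirm these settings by inspecting the repository's configuration. A minimal sketch, assuming the checkpoint ships a GPTQ-style quantization_config in its config.json (exact field names may vary across auto-round/GPTQ versions):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-gptq-inc")
# The quantization block is read from config.json; expect bits=4, group_size=128, sym=True.
print(config.quantization_config)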

How To Use

INT4 Inference on CPU/Intel GPU/CUDA

from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-gptq-inc"

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}  ##change this to align with the official usage
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)
tokenizer.padding_side = "left"  ## decoder-only models should be left-padded for batched generation
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  ##change this to align with the official usage
    num_return_sequences=1,
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)

"""
Prompt: 9.11和9.8哪个数字大
Generated: <think>
首先,用户的问题是:“9.11和9.8哪个数字大?”这是一个比较两个数字大小的问题。

数字是9.11和9.8。9.11是九点一一,9.8是九点八。

在比较小数时,我们需要考虑小数点后的位数。9.11有两位小数,9.8有一位小数,但我们可以将它们视为相同的小数位数来比较。

9.8可以写成9.80,这样就有两位小数了。所以,9.80和9.11。

现在,比较9.80和9.11。整数部分相同,都是9。所以,我们需要比较小数部分。

小数部分:9.80的十分位是8,百分位是0;9.11的十分位是1,百分位是1。

从十分位开始比较:8 vs 1,8大于1,所以9.80大于9.11。

因此,9.8大于9.11。

用户可能是在考虑日期或版本号,但问题明确是“数字”,所以应该作为数值来比较。

在数值上,9.8是9.80,而9.11是9.11,所以9.80 > 9.11。

确认一下:9.8 = 9 + 8/10 = 9.80,而9.11 = 9 + 1/10 + 1/100 = 9.11。

是的,9.80比9.11大。

如果用户是在考虑字符串比较,比如版本号,那么9.11可能被视为9.11,而9.8被视为9.8,但在版本号比较中,通常是从左到右比较数字部分。

例如,在软件版本中,9.8可能被视为9.8.0,所以比较9.11和9.8.0。

9.11的主版本是9,次版本是11;9.8的主版本是9,次版本是8。

所以,主版本相同,次版本11大于8,因此9.11大于9.8。

但问题中说的是“数字
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: <think>
First, the user asked: "How many e in word deepseek". I need to count the number of 'e's in the word "deepseek".

Let me write down the word: D-E-E-P-S-E-E-K.

Now, I'll go through each letter and count the 'e's.

- D: not e

- E: e (first one)

- E: e (second one)

- P: not e

- S: not e

- E: e (third one)

- E: e (fourth one)

- K: not e

So, there are four 'e's in "deepseek".

I should confirm the spelling. "Deepseek" – is that the correct spelling? I think it's a company name or something, but based on the query, it's given as "deepseek", so I'll assume that's the word.

Also, the user said "e in word", which might be a typo or shorthand for "e's in the word". Probably, they meant "how many e's are in the word deepseek".

Now, to respond, I should be clear and direct. Since this is a simple count, I can just state the number.

But let me make sure there's no trick here. Is "deepseek" meant to be "DeepSeek" or something else? The query says "deepseek", so I'll treat it as a lowercase word for counting purposes.

Perhaps it's a reference to the AI model DeepSeek, but I think for this question, it's just about the word itself.

So, counting: positions with 'e' are 2nd, 3rd, 6th, and 7th letters. That's four.

I can also think of it as the word has 8 letters, and 'e' appears at indices 2,3,6,7 if we start counting from 1.

But for the response, I don't need to be too technical.

Finally, I should phrase my response to be helpful and engaging, but since the question is straightforward, a simple answer might suffice.

Response: There are 4 'e's in the word "deepseek".
</think>
The word "deepseek" contains **4** instances of the letter "e". 

To break it down:
- The word is spelled: D-E-E-P-S-E
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: <think>
First, the question is: "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?"

This seems like a riddle. The straightforward answer might be nine, but I think there's a trick here. Riddles often play on words or assumptions.

Let me read it carefully. "There are ten birds in a tree." Then, "A hunter shoots one." And it asks how many are left.

Now, when a hunter shoots a bird, does that mean the bird dies and falls out of the tree? Or does it fly away? That's a common point in these kinds of riddles.

The classic version of this riddle is: "There are 10 birds in a tree. A hunter shoots one. How many are left?" And the answer is zero because the sound of the gunshot scares the others away.

But in this case, it's similar. Let me confirm the wording. "A hunter shoots one." It doesn't specify if the shot is heard or if the birds are scared.

However, in the standard riddle, the implication is that the shot scares the remaining birds, so they fly away, leaving none.

But here, it just says "shoots one," not necessarily that the shot is loud or that the birds are scared. Maybe I should consider the context.

Perhaps it's about the birds being alive or dead. But the question is about how many are left in the tree, not how many are alive.

Another angle: maybe the hunter is shooting at the birds, but not necessarily hitting them or causing them to leave. But that seems less likely.

Let's think step by step.

Initially, there are ten birds in the tree.

Hunter shoots one. What happens to that one? It's shot, so probably it dies or falls down.

But the question is about how many are left, meaning remaining in the tree.

If the shot kills the bird, it might fall from the tree, so it's no longer in the tree.

Then, the other nine might be scared and fly away.

But the riddle doesn't say that the shot scares the birds; it just says the hunter shoots one.

In many versions, the key is that the shot scares the birds, so they all fly away.

But let's see if there's another interpretation.

Perhaps the birds are not real birds; maybe it's a
--------------------------------------------------
"""

vLLM usage

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)  ##change this to match official usage
model_name = "Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-gptq-inc"
llm = LLM(model=model_name, tensor_parallel_size=1)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Evaluate the model

auto-round --eval  --model "Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-inc" --eval_bs 16  --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
| Metric | BF16 | INT4 (auto-round) | INT4 (auto-round-best) |
|---|---|---|---|
| Avg | 0.5958 | 0.5913 | 0.5926 |
| arc_challenge | 0.5137 | 0.5102 | 0.5043 |
| arc_easy | 0.7908 | 0.7862 | 0.7921 |
| boolq | 0.8498 | 0.8526 | 0.8443 |
| ceval-valid | 0.7296 | 0.7177 | 0.7140 |
| cmmlu | 0.7159 | 0.7029 | 0.7027 |
| gsm8k | 0.8211 | 0.8029 | 0.8234 |
| hellaswag | 0.5781 | 0.5703 | 0.5670 |
| lambada_openai | 0.5544 | 0.5490 | 0.5626 |
| leaderboard_ifeval | 0.2731 | 0.2729 | 0.2542 |
| leaderboard_mmlu_pro | 0.4115 | 0.4105 | 0.4117 |
| openbookqa | 0.3020 | 0.3060 | 0.3100 |
| piqa | 0.7617 | 0.7617 | 0.7612 |
| truthfulqa_mc1 | 0.3562 | 0.3611 | 0.3696 |
| winogrande | 0.6835 | 0.6740 | 0.6788 |
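
If you prefer driving the evaluation from Python instead of the auto-round CLI, here is a minimal sketch assuming lm-evaluation-harness's simple_evaluate entry point (the task subset and batch size are illustrative; the CLI command above remains the reference):

import lm_eval

# Evaluate a small task subset of the quantized checkpoint via lm-evaluation-harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Intel/DeepSeek-R1-0528-Qwen3-8B-int4-AutoRound-gptq-inc,dtype=auto",
    tasks=["gsm8k", "piqa", "hellaswag"],
    batch_size=16,
)
print(results["results"])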

Reproduce the model

Here is a sample command to reproduce the model:

auto-round \
    --model_name deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
    --device 0 \
    --format "auto_gptq,auto_awq,auto_round" \
    --enable_torch_compile \
    --output_dir "./tmp_autoround"
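
An equivalent Python sketch using the auto-round API (argument names follow the auto-round documentation as commonly shown; treat them as assumptions and check against the release you have installed):

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weights, group_size 128, symmetric quantization, matching the settings described above.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_gptq", inplace=True)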

Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of LLMs},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

